Event-Based Similarity Search and its Applications in Business Analytics

Author

  • Martin Suntinger
Abstract

Table of contents

1 Introduction
  1.1 Technological background
  1.2 Objectives
  1.3 Data structure and data repository
    1.3.1 Single events
    1.3.2 Event correlations
    1.3.3 Database structure
  1.4 The SENACTIVE EventAnalyzer
  1.5 General remarks
2 Related work
  2.1 Similarity applications
  2.2 Similarity models
  2.3 Event sequence and attribute similarity
  2.4 Time series similarity
  2.5 Similarity pattern modeling and search interfaces
3 Application examples and arising requirements
  3.1 Finance – market analysis and trading scenario discovery
    3.1.1 Overview
    3.1.2 Similarity search example – trading scenarios
    3.1.3 Requirements for similarity searching
  3.2 Online betting fraud detection – user behavior profiles
    3.2.1 Overview
    3.2.2 Similarity search example
    3.2.3 Requirements for similarity searching
  3.3 Airport turnaround – detecting process deviations
    3.3.1 Overview
    3.3.2 Similarity search example
    3.3.3 Requirements for similarity searching
  3.4 Other application areas
    3.4.1 Supply-chain/shipment processes
    3.4.2 ITSM – Trouble-ticket tracing
    3.4.3 Clickstream – Usage patterns
4 Similarity assessment model
  4.1 Summary of approach
    4.1.1 A multi-level similarity approach
    4.1.2 Similarity versus distance
  4.2 Single event similarity
    4.2.1 Normalized absolute difference similarity
    4.2.2 Relative difference similarity
    4.2.3 String distance metric similarity
    4.2.4 Lookup table similarity
    4.2.5 Boolean similarity
    4.2.6 Multi-value similarity
    4.2.7 Nested event similarity
    4.2.8 Attribute expression similarity
    4.2.9 Generic similarity
    4.2.10 Event level constraints
  4.3 Event sequence similarity
    4.3.1 Overview and definitions
    4.3.2 Event type occurrence
    4.3.3 Occurrence times of events
    4.3.4 Numeric sequence similarity
    4.3.5 Event sequence level constraints blocks
5 Similarity computation
  5.1 The base algorithm
    5.1.1 Finding the best solution: an assignment-based approach
    5.1.2 Implementation model
  5.2 Enhanced search pattern building blocks
    5.2.1 Integration into the base algorithm
    5.2.2 Restrictive blocks
    5.2.3 Widening blocks
    5.2.4 Asymptotic runtime
  5.3 Time series similarity for event attributes
    5.3.1 Overview and requirements
    5.3.2 Applied time-series similarity model
    5.3.3 Asymptotic runtime
    5.3.4 Results and performance
    5.3.5 Integration into base similarity algorithm
  5.4 Generic similarity
6 Implementation
  6.1 Data and memory management
    6.1.1 Incremental load architecture
    6.1.2 Bulk load architecture
7 Providing similarity mining to the analyst
  7.1 Overview
  7.2 User workflow for similarity mining
    7.2.1 Setting the base similarity configuration and similarity priorities
    7.2.2 Workflow model 1: Querying by example
    7.2.3 Workflow model 2: Building a search pattern
  7.3 Similarity search pattern modeling
    7.3.1 The similarity pattern editor
  7.4 Similarity search management
  7.5 Visualizing similarity search results
    7.5.1 Similarity ranking view
    7.5.2 Graphical view
8 Results and evaluation
  8.1 Overview
  8.2 Case studies
    8.2.1 C1 Online gambling – user activity histories
    8.2.2 C2 Trouble tickets – change history sequences
    8.2.3 C3 Credit card transaction – sequences of purchases
    8.2.4 C4 Algorithmic trading – trading scenario discovery
9 Summary, conclusions and future work
Appendix A – The STSimilarity library

Similar resources

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied to big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open source, has not gone unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Enhancing Organizational Performance through Event-based Process Predictions

Enterprises in today’s globalized world are compelled to react to threats and opportunities in a highly flexible manner. Due to technological advancements, real-time information availability, especially in manufacturing operations, has reached new dimensions and increasingly provides Big Data. With Complex Event Processing (CEP) the required technology to analyze and correlate heterogeneous eve...

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

Traffic congestion control using Smartphone sensors based on IoT Technology

Traffic congestion in road networks is one of the main issues to be addressed, and vehicle traffic monitoring has become one of the critical issues in road transport. With the help of an Intelligent Transportation System (ITS), current traffic information can be used by the control room to improve traffic efficiency. The suggested system utilizes technologies for real-time collect...

On the Distinction between Truthful, Invisible, False and Unobserved Events: An Event Existence Classification Framework and the Impact on Business Process Analytics Related Research Areas

In this paper we present an event existence classification framework based on five business criteria. As a result we are able to distinguish thirteen event types distributed over four categories, i.e. truthful, invisible, false and unobserved events. Currently, several of these event types are not commonly dealt with in business process analytics research. Based on the proposed framework we sit...

Publication date: 2009